Chapter 8 - Exponential smoothing
R. Hyndman/R. J. Serrano
7/30/2022
Exponential smoothing
Historical perspective
- Developed in the 1950s and 1960s as methods (algorithms) to produce
point forecasts.
- Combine a “level”, “trend” (slope) and “seasonal” component to
describe a time series.
- The rate of change of the components is controlled by “smoothing
parameters”: \(\alpha\), \(\beta\) and \(\gamma\) respectively.
- Need to choose best values for the smoothing parameters (and initial
states).
- Equivalent ETS state space models developed in the 1990s and
2000s.
Big idea: control the rate of change
\(\alpha\) controls the flexibility of the level
- If \(\alpha = 0\), the level never updates (mean)
- If \(\alpha = 1\), the level updates completely (naive)
\(\beta\) controls the flexibility of the trend
- If \(\beta = 0\), the trend is linear
- If \(\beta = 1\), the trend changes suddenly every observation
\(\gamma\) controls the flexibility of the seasonality
- If \(\gamma = 0\), the seasonality is fixed (seasonal means)
- If \(\gamma = 1\), the seasonality updates completely (seasonal naive)
A model for levels, trends, and seasonalities
We want a model that captures the level (\(\ell_t\)), trend (\(b_t\)) and seasonality (\(s_t\)).
ETS models
- Error: additive ("A") or multiplicative ("M")
- Trend: none ("N"), additive ("A"), multiplicative ("M"), or damped ("Ad" or "Md")
- Seasonality: none ("N"), additive ("A") or multiplicative ("M")
Simple exponential smoothing
Simple methods
Time series \(y_1,y_2,\dots,y_T\).
- Want something in between the mean and naive methods.
- Most recent data should have more weight.
Simple Exponential Smoothing
- \(\ell_t\) is the level (or the
smoothed value) of the series at time t.
- \(\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}\)
Iterate to get exponentially weighted moving average form.
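The iteration can be sketched in base R. This is a minimal illustration rather than the fable implementation: the helper name `ses_level`, the toy series and the chosen \(\alpha\) and \(\ell_0\) are all assumptions made for this example. It checks that the recursive form and the exponentially weighted average form give the same one-step forecast.

```r
# Minimal SES sketch (hypothetical helper, not part of fable).
# Returns the levels l_1, ..., l_T given alpha and the initial level l0.
ses_level <- function(y, alpha, l0) {
  level <- numeric(length(y))
  prev <- l0
  for (t in seq_along(y)) {
    # l_t = alpha * y_t + (1 - alpha) * l_{t-1}
    level[t] <- alpha * y[t] + (1 - alpha) * prev
    prev <- level[t]
  }
  level
}

y <- c(3, 5, 4, 6, 7)  # toy data, for illustration only
alpha <- 0.5
l0 <- 3

# Recursive form: the one-step forecast for T+1 is the last level.
f_recursive <- tail(ses_level(y, alpha, l0), 1)

# Weighted-average form: weight alpha * (1 - alpha)^j on y_{T-j},
# plus weight (1 - alpha)^T on the initial level.
n <- length(y)
w <- alpha * (1 - alpha)^(0:(n - 1))
f_weighted <- sum(w * rev(y)) + (1 - alpha)^n * l0

all.equal(f_recursive, f_weighted)  # TRUE
```

Setting `alpha = 1` in `ses_level()` reproduces the observations (naive forecasts), while `alpha = 0` keeps the level at `l0` throughout.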
Optimising smoothing parameters
- Need to choose best values for \(\alpha\) and \(\ell_0\).
- Similarly to regression, choose optimal parameters by minimising
SSE: \[
\text{SSE}=\sum_{t=1}^T(y_t - \hat{y}_{t|t-1})^2.
\]
- Unlike regression there is no closed form solution — use numerical
optimization.
- For Algerian Exports example:
- \(\hat\alpha = 0.8400\)
- \(\hat\ell_0 = 39.54\)
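A rough sketch of the numerical optimisation, using base R's `optim()`. The function name `ses_sse`, the simulated toy series and the starting values are assumptions for illustration; this is not how fable estimates the model, and the result will not match the Algerian figures.

```r
# One-step SSE for SES as a function of par = c(alpha, l0).
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]
  sse <- 0
  for (t in seq_along(y)) {
    e <- y[t] - level           # one-step forecast error: y_t - l_{t-1}
    sse <- sse + e^2
    level <- level + alpha * e  # error-correction update of the level
  }
  sse
}

set.seed(1)
y <- 40 + cumsum(rnorm(50))  # toy series, not the Algerian exports data

start <- c(alpha = 0.5, l0 = y[1])
opt <- optim(
  par = start, fn = ses_sse, y = y,
  method = "L-BFGS-B",                   # allows box constraints
  lower = c(0, -Inf), upper = c(1, Inf)  # keep 0 <= alpha <= 1
)
opt$par    # estimated alpha and l0
opt$value  # minimised SSE
```

The box constraint on `alpha` is a design choice for this sketch; other optimisers or parameter regions could be used.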
Simple Exponential Smoothing

Models and methods
Methods
- Algorithms that return point forecasts.
Models
- Generate same point forecasts but can also generate forecast
distributions.
- A stochastic (or random) data generating process that can generate
an entire forecast distribution.
- Allow for “proper” model selection.
ETS(A,N,N): SES with additive errors
Forecast error:
\(e_t = y_t - \hat{y}_{t|t-1} = y_t - \ell_{t-1}\).
Specify a probability distribution for \(e_t\): we assume \(e_t =
\varepsilon_t\sim\text{NID}(0,\sigma^2)\).
ETS(A,N,N): SES with additive errors
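\[\begin{align*}
y_t&=\ell_{t-1}+\varepsilon_t\\
\ell_t&=\ell_{t-1}+\alpha \varepsilon_t
\end{align*}\] where \(\varepsilon_t\sim\text{NID}(0,\sigma^2)\).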
- “innovations” or “single source of error” because equations have the
same error process, \(\varepsilon_t\).
- Measurement equation: relationship between observations and
states.
- State equation(s): evolution of the state(s) through time.
ETS(M,N,N): SES with multiplicative errors.
- Specify relative errors \(\varepsilon_t=\frac{y_t-\hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\sim
\text{NID}(0,\sigma^2)\)
- Substituting \(\hat{y}_{t|t-1}=\ell_{t-1}\) gives:
- \(y_t =
\ell_{t-1}+\ell_{t-1}\varepsilon_t\)
- \(e_t = y_t - \hat{y}_{t|t-1} =
\ell_{t-1}\varepsilon_t\)
- Models with additive and multiplicative errors with the same
parameters generate the same point forecasts but different prediction
intervals.
ETS(A,N,N): Specifying the model
ETS(y ~ error("A") + trend("N") + season("N"))
By default, optimal values for \(\alpha\) and \(\ell_0\) are used.
\(\alpha\) can be chosen manually in
trend().
trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))
Example: Algerian Exports
algeria_economy <- global_economy %>%
  filter(Country == "Algeria")
fit <- algeria_economy %>%
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.84
##
## Initial states:
## l[0]
## 39.5
##
## sigma^2: 35.6
##
## AIC AICc BIC
## 447 447 453
Example: Algerian Exports
components(fit) %>% autoplot()

Example: Algerian Exports
components(fit) %>%
left_join(fitted(fit), by = c("Country", ".model", "Year"))
Example: Algerian Exports
fit %>%
forecast(h = 5) %>%
autoplot(algeria_economy) +
labs(y = "% of GDP", title = "Exports: Algeria")

Models with trend
Holt’s linear trend
- Two smoothing parameters \(\alpha\)
and \(\beta^*\) (\(0\le\alpha,\beta^*\le1\)).
- \(\ell_t\) level: weighted average
between \(y_t\) and one-step ahead
forecast for time \(t\), \((\ell_{t-1} + b_{t-1}=\hat{y}_{t|t-1})\)
- \(b_t\) slope: weighted average of
\((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of
slope.
- Choose \(\alpha, \beta^*, \ell_0,
b_0\) to minimise SSE.
ETS(A,A,N)
Holt’s linear method with additive errors.
- Assume \(\varepsilon_t=y_t-\ell_{t-1}-b_{t-1} \sim
\text{NID}(0,\sigma^2)\).
- Substituting into the error correction equations for Holt’s linear
method \[\begin{align*}
y_t&=\ell_{t-1}+b_{t-1}+\varepsilon_t\\
\ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\
b_t&=b_{t-1}+\alpha\beta^* \varepsilon_t
\end{align*}\]
- For simplicity, set \(\beta=\alpha
\beta^*\).
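The error-correction equations above translate directly into a loop. The sketch below is base R with a hypothetical helper `holt_filter`; the toy series and parameter values are assumptions chosen so the arithmetic is easy to follow. It uses \(\beta=\alpha\beta^*\) directly.

```r
# Holt's linear trend in error-correction form (hypothetical helper).
holt_filter <- function(y, alpha, beta, l0, b0) {
  level <- l0
  slope <- b0
  fitted <- numeric(length(y))
  for (t in seq_along(y)) {
    fitted[t] <- level + slope          # one-step forecast l_{t-1} + b_{t-1}
    e <- y[t] - fitted[t]               # epsilon_t
    level <- level + slope + alpha * e  # l_t = l_{t-1} + b_{t-1} + alpha * e
    slope <- slope + beta * e           # b_t = b_{t-1} + beta * e
  }
  list(level = level, slope = slope, fitted = fitted)
}

y <- 10 + 2 * (1:8)  # toy, perfectly linear series
fit <- holt_filter(y, alpha = 0.8, beta = 0.2, l0 = 10, b0 = 2)

h <- 1:3
fit$level + h * fit$slope  # point forecasts l_T + h * b_T: 28 30 32
```

Because the toy series is exactly linear and the initial states match it, every error is zero and the forecasts simply continue the line.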
Exponential smoothing: trend/slope
ETS(M,A,N)
Holt’s linear method with multiplicative errors.
- Assume \(\varepsilon_t=\frac{y_t-(\ell_{t-1}+b_{t-1})}{(\ell_{t-1}+b_{t-1})}\)
- Following a similar approach as above, the innovations state space
model underlying Holt’s linear method with multiplicative errors is
specified as \[\begin{align*}
y_t&=(\ell_{t-1}+b_{t-1})(1+\varepsilon_t)\\
\ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\
b_t&=b_{t-1}+\beta(\ell_{t-1}+b_{t-1}) \varepsilon_t
\end{align*}\] where again \(\beta=\alpha \beta^*\) and \(\varepsilon_t \sim
\text{NID}(0,\sigma^2)\).
ETS(A,A,N): Specifying the model
ETS(y ~ error("A") + trend("A") + season("N"))
By default, optimal values for \(\beta\) and \(b_0\) are used.
\(\beta\) can be chosen manually in
trend().
trend("A", beta = 0.004)
trend("A", beta_range = c(0, 0.1))
Example: Australian population
aus_economy <- global_economy %>%
  filter(Code == "AUS") %>%
  mutate(Pop = Population / 1e6)
fit <- aus_economy %>%
  model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))
report(fit)
## Series: Pop
## Model: ETS(A,A,N)
## Smoothing parameters:
## alpha = 1
## beta = 0.327
##
## Initial states:
## l[0] b[0]
## 10.1 0.222
##
## sigma^2: 0.0041
##
## AIC AICc BIC
## -77.0 -75.8 -66.7
Example: Australian population
components(fit) %>% autoplot()

Example: Australian population
components(fit) %>%
left_join(fitted(fit), by = c("Country", ".model", "Year"))
Example: Australian population
fit %>%
forecast(h = 10) %>%
autoplot(aus_economy) +
labs(y = "Millions", title = "Population: Australia")

Damped trend method
- Damping parameter \(0<\phi<1\).
- If \(\phi=1\), identical to Holt’s
linear trend.
- As \(h\rightarrow\infty\), \(\hat{y}_{T+h|T}\rightarrow \ell_T+\phi
b_T/(1-\phi)\).
- Short-run forecasts trended, long-run forecasts constant.
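The flattening of damped forecasts can be checked numerically. This sketch assumes the usual damped-trend forecast function \(\hat{y}_{T+h|T} = \ell_T + (\phi+\phi^2+\dots+\phi^h)b_T\); the helper name `damped_forecast` and the values of \(\ell_T\), \(b_T\) and \(\phi\) are illustrative only.

```r
# Damped-trend point forecasts: l_T + (phi + phi^2 + ... + phi^h) * b_T
damped_forecast <- function(l_T, b_T, phi, h) {
  sapply(h, function(k) l_T + sum(phi^seq_len(k)) * b_T)
}

l_T <- 24; b_T <- 0.3; phi <- 0.9  # illustrative values only

fc <- damped_forecast(l_T, b_T, phi, h = c(1, 5, 20, 200))
fc                                # rises quickly, then flattens
limit <- l_T + phi * b_T / (1 - phi)
limit                             # long-run constant: 26.7

# With phi = 1 the forecasts reduce to Holt's linear trend, l_T + h * b_T.
damped_forecast(l_T, b_T, phi = 1, h = 1:3)
```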
Example: Australian population
aus_economy %>%
model(holt = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) %>%
forecast(h = 20) %>%
autoplot(aus_economy)

Example: Australian population
# Train on data up to 2010; later years are held out as the test set
fit <- aus_economy %>%
  filter(Year <= 2010) %>%
  model(
    ses = ETS(Pop ~ error("A") + trend("N") + season("N")),
    holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
    damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))
  )
Example: Australian population
| | ses | holt | damped |
|---|---|---|---|
| \(\alpha\) | 1.00 | 1.00 | 1.00 |
| \(\beta^*\) | | 0.30 | 0.40 |
| \(\phi\) | | | 0.98 |
| \(b_0\) | | 0.22 | 0.25 |
| \(\ell_0\) | 10.28 | 10.05 | 10.04 |
| Training RMSE | 0.24 | 0.06 | 0.07 |
| Test RMSE | 1.63 | 0.15 | 0.21 |
| Test MASE | 6.18 | 0.55 | 0.75 |
| Test MAPE | 6.09 | 0.55 | 0.74 |
| Test MAE | 1.45 | 0.13 | 0.18 |
Models with seasonality
Holt-Winters additive method
Holt and Winters extended Holt’s method to capture seasonality.
- \(k=\) integer part of \((h-1)/m\). Ensures estimates from the final
year are used for forecasting.
- Parameters: \(0\le \alpha\le 1\),
\(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\) and \(m=\) period of seasonality (e.g. \(m=4\) for quarterly data).
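To see what \(k\) does, the following base R sketch computes which seasonal state an \(h\)-step forecast picks up. It assumes the standard additive Holt-Winters forecast equation \(\hat{y}_{T+h|T} = \ell_T + hb_T + s_{T+h-m(k+1)}\); the helper name `seasonal_lag` is hypothetical.

```r
# Offset (relative to T) of the seasonal state used for an h-step forecast:
# s_{T + h - m(k + 1)}, with k the integer part of (h - 1) / m.
seasonal_lag <- function(h, m) {
  k <- (h - 1) %/% m
  h - m * (k + 1)  # always one of -(m - 1), ..., 0
}

# Quarterly data (m = 4): horizons 1 to 9 cycle through the final year's states.
seasonal_lag(1:9, m = 4)  # -3 -2 -1 0 -3 -2 -1 0 -3
```

However far ahead the forecast goes, the offset never leaves the last \(m\) observations, which is why only the final year's seasonal estimates are used.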
Holt-Winters additive method
- Seasonal component is usually expressed as \(s_{t} = \gamma^* (y_{t}-\ell_{t})+
(1-\gamma^*)s_{t-m}.\)
- Substitute in for \(\ell_t\): \(s_{t} = \gamma^*(1-\alpha)
(y_{t}-\ell_{t-1}-b_{t-1})+ [1-\gamma^*(1-\alpha)]s_{t-m}\)
- We set \(\gamma=\gamma^*(1-\alpha)\).
- The usual parameter restriction is \(0\le\gamma^*\le1\), which translates to
\(0\le\gamma\le(1-\alpha)\).
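The substitution step can be written out in full. This sketch assumes the standard Holt-Winters additive level equation \(\ell_t = \alpha(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1}+b_{t-1})\), which does not appear in the text above.
\[\begin{align*}
y_t-\ell_t &= (1-\alpha)(y_t-\ell_{t-1}-b_{t-1}) + \alpha s_{t-m}\\
s_t &= \gamma^*(y_t-\ell_t) + (1-\gamma^*)s_{t-m}\\
&= \gamma^*(1-\alpha)(y_t-\ell_{t-1}-b_{t-1}) + [\gamma^*\alpha + 1-\gamma^*]s_{t-m}\\
&= \gamma^*(1-\alpha)(y_t-\ell_{t-1}-b_{t-1}) + [1-\gamma^*(1-\alpha)]s_{t-m}
\end{align*}\]
Since \(0\le\gamma^*\le1\), the coefficient \(\gamma=\gamma^*(1-\alpha)\) lies between \(0\) and \(1-\alpha\).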
Exponential smoothing: seasonality
ETS(A,A,A)
Holt-Winters additive method with additive errors.
- Forecast errors: \(\varepsilon_{t} = y_t -
\hat{y}_{t|t-1}\)
- \(k\) is integer part of \((h-1)/m\).
Holt-Winters multiplicative method
Seasonal variations change in proportion to the level of the
series.
- \(k\) is integer part of \((h-1)/m\).
- Additive method: \(s_t\) in
absolute terms — within each year \(\sum_i s_i
\approx 0\).
- Multiplicative method: \(s_t\) in
relative terms — within each year \(\sum_i s_i
\approx m\).
ETS(M,A,M)
Holt-Winters multiplicative method with multiplicative errors.
- Forecast errors: \(\varepsilon_{t} = (y_t
- \hat{y}_{t|t-1})/\hat{y}_{t|t-1}\)
- \(k\) is integer part of \((h-1)/m\).
Example: Australian holiday tourism
# Total holiday trips across all regions
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips))
# Fit additive and multiplicative Holt-Winters models
fit <- aus_holidays %>%
  model(
    additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
  )
fc <- fit %>% forecast()
Example: Australian holiday tourism
fc %>%
autoplot(aus_holidays, level = NULL) +
labs(y = "Thousands", title = "Overnight trips")

Estimated components

Holt-Winters damped method
Often the single most accurate forecasting method for seasonal data:
Holt-Winters with daily data
# Daily pedestrian counts (in thousands) at Southern Cross Station
sth_cross_ped <- pedestrian %>%
  filter(
    Date >= "2016-07-01",
    Sensor == "Southern Cross Station"
  ) %>%
  index_by(Date) %>%
  summarise(Count = sum(Count) / 1000)
# Fit a damped Holt-Winters model to July 2016 and forecast two weeks ahead
sth_cross_ped %>%
  filter(Date <= "2016-07-31") %>%
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) %>%
  forecast(h = "2 weeks") %>%
  autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) +
  labs(
    title = "Daily traffic: Southern Cross",
    y = "Pedestrians ('000)"
  )
Holt-Winters with daily data

Innovations state space models
Exponential smoothing methods
ETS models
Additive error models
Multiplicative error models
Estimating ETS models
- Smoothing parameters \(\alpha\),
\(\beta\), \(\gamma\) and \(\phi\), and the initial states \(\ell_0\), \(b_0\), \(s_0,s_{-1},\dots,s_{-m+1}\) are estimated
by maximising the “likelihood” = the probability of the data arising
from the specified model.
- For models with additive errors, this is equivalent to minimising SSE.
- For models with multiplicative errors, it is not equivalent to minimising
SSE.
Innovations state space models
Let
\(\bm{x}_t = (\ell_t, b_t, s_t, s_{t-1},
\dots, s_{t-m+1})\) and
\(\varepsilon_t\stackrel{\mbox{\scriptsize
iid}}{\sim} \mbox{N}(0,\sigma^2)\).
- Additive errors: \(k(\bm{x}_{t-1})=1\), so \(y_t = \mu_{t} + \varepsilon_t\).
- Multiplicative errors: \(k(\bm{x}_{t-1}) = \mu_{t}\), so \(y_t = \mu_{t}(1 + \varepsilon_t)\), where \(\varepsilon_t = (y_t - \mu_t)/\mu_t\) is
the relative error.
Innovations state space models
- Estimate parameters \(\bm\theta =
(\alpha,\beta,\gamma,\phi)\) and initial states \(\bm{x}_0 =
(\ell_0,b_0,s_0,s_{-1},\dots,s_{-m+1})\) by minimizing \(L^*\).
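A sketch of the criterion for the simplest case, simple exponential smoothing. It assumes the usual form \(L^* = T\log\big(\sum_{t=1}^T \varepsilon_t^2\big) + 2\sum_{t=1}^T \log|k(\bm{x}_{t-1})|\), which is not written out in the text above; the helper name `ses_lstar` and the toy data are hypothetical.

```r
# L* for simple exponential smoothing with additive ("A") or
# multiplicative ("M") errors. Assumed form:
#   L* = T * log(sum(eps^2)) + 2 * sum(log(abs(k(x_{t-1}))))
# with k = 1 for additive errors and k = l_{t-1} for multiplicative errors.
ses_lstar <- function(par, y, error = c("A", "M")) {
  error <- match.arg(error)
  alpha <- par[1]
  level <- par[2]
  n <- length(y)
  eps <- numeric(n)
  logk <- numeric(n)
  for (t in seq_len(n)) {
    mu <- level                     # one-step forecast l_{t-1}
    if (error == "A") {
      eps[t] <- y[t] - mu
      logk[t] <- 0                  # log|1|
      level <- mu + alpha * eps[t]
    } else {
      eps[t] <- (y[t] - mu) / mu    # relative error
      logk[t] <- log(abs(mu))
      level <- mu * (1 + alpha * eps[t])
    }
  }
  n * log(sum(eps^2)) + 2 * sum(logk)
}

y <- c(40, 42, 41, 44, 43, 45)  # toy data
ses_lstar(c(alpha = 0.5, l0 = 40), y, error = "A")
ses_lstar(c(alpha = 0.5, l0 = 40), y, error = "M")
```

With additive errors the second term vanishes, so \(L^*\) is a monotone function of the SSE. With multiplicative errors the extra term depends on the states, so the two criteria differ.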
Parameter restrictions
Usual region
- Traditional restrictions in the methods: \(0<
\alpha,\beta^*,\gamma^*,\phi<1\) (equations interpreted as
weighted averages).
- In models we set \(\beta=\alpha\beta^*\) and \(\gamma=(1-\alpha)\gamma^*\).
- Therefore \(0< \alpha <1\),
\(0 < \beta < \alpha\) and
\(0< \gamma < 1-\alpha\).
- \(0.8<\phi<0.98\) — to
prevent numerical difficulties.
Admissible region
- To prevent observations in the distant past having a continuing
effect on current forecasts.
- Usually (but not always) less restrictive than the usual region.
- For example, for ETS(A,N,N) the usual region is \(0< \alpha
<1\) while the admissible region is \(0< \alpha
<2\).
Model selection
\[\text{AIC} = -2\log(L) + 2k\]
where \(L\) is the likelihood and
\(k\) is the number of parameters
and initial states estimated in the model.
\[\text{AIC}_{\text{c}} = \text{AIC} + \frac{2k(k+1)}{T-k-1}\]
which is the AIC corrected (for small sample bias).
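For concreteness, the criteria can be computed from a log-likelihood in a few lines of base R. The helper name `info_criteria` and the input numbers are hypothetical; the formulas assumed are \(\text{AIC} = -2\log(L)+2k\), \(\text{AIC}_{\text{c}} = \text{AIC} + 2k(k+1)/(T-k-1)\) and \(\text{BIC} = \text{AIC} + k[\log(T)-2]\).

```r
# Information criteria from a log-likelihood (hypothetical helper).
# k = number of estimated parameters and initial states, n = sample size T.
info_criteria <- function(loglik, k, n) {
  aic <- -2 * loglik + 2 * k
  c(
    AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n - k - 1),  # small-sample correction
    BIC  = aic + k * (log(n) - 2)
  )
}

ic <- info_criteria(loglik = -220, k = 3, n = 58)  # illustrative numbers only
round(ic, 1)
```

The correction term shrinks as \(T\) grows relative to \(k\), so AIC and AICc agree closely for long series.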
AIC and cross-validation
Automatic forecasting
From Hyndman et al. (IJF, 2002):
- Apply each model that is appropriate to the data. Optimize
parameters and initial values using MLE (or some other criterion).
- Select best method using AICc.
- Produce forecasts using best method.
- Obtain forecast intervals using underlying state space model.
Method performed very well in M3 competition.
Example: National populations
fit <- global_economy %>%
mutate(Pop = Population / 1e6) %>%
model(ets = ETS(Pop))
fit
Example: Australian holiday tourism
holidays <- tourism %>%
filter(Purpose == "Holiday")
fit <- holidays %>% model(ets = ETS(Trips))
fit
Example: Australian holiday tourism
fit %>%
filter(Region == "Snowy Mountains") %>%
report()
## Series: Trips
## Model: ETS(M,N,A)
## Smoothing parameters:
## alpha = 0.157
## gamma = 1e-04
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3]
## 142 -61 131 -42.2 -27.7
##
## sigma^2: 0.0388
##
## AIC AICc BIC
## 852 854 869
Example: Australian holiday tourism
fit %>%
  filter(Region == "Snowy Mountains") %>%
  components()
Example: Australian holiday tourism
fit %>%
  filter(Region == "Snowy Mountains") %>%
  components() %>%
  autoplot()

Example: Australian holiday tourism
fit %>% forecast() %>%
filter(Region == "Snowy Mountains") %>%
autoplot(holidays) +
labs(y = "Thousands", title = "Overnight trips")

Residuals
Response residuals
\[\hat{e}_t = y_t -
\hat{y}_{t|t-1}\]
Innovation residuals
Additive error model: \[\hat\varepsilon_t
= y_t - \hat{y}_{t|t-1}\]
Multiplicative error model: \[\hat\varepsilon_t = \frac{y_t -
\hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\]
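A tiny numerical illustration of the two residual types in base R. The observations and one-step fitted values are made up for this example, and the variable names are hypothetical.

```r
y    <- c(100, 110, 121)  # toy observations
yhat <- c(100, 100, 110)  # toy one-step fitted values

response_resid       <- y - yhat           # 0 10 11
innov_additive       <- y - yhat           # identical to response residuals
innov_multiplicative <- (y - yhat) / yhat  # 0 0.1 0.1 (relative errors)
```

For additive error models the two residual types coincide; for multiplicative error models the innovation residuals are on a relative scale.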
Example: Australian holiday tourism
aus_holidays <- tourism %>%
filter(Purpose == "Holiday") %>%
summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
model(ets = ETS(Trips)) %>%
report()
## Series: Trips
## Model: ETS(M,N,M)
## Smoothing parameters:
## alpha = 0.358
## gamma = 0.000969
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3]
## 9667 0.943 0.927 0.968 1.16
##
## sigma^2: 0.0022
##
## AIC AICc BIC
## 1331 1333 1348
Example: Australian holiday tourism
residuals(fit)
residuals(fit, type = "response")

Some unstable models
- Some of the combinations of (Error, Trend, Seasonal) can lead to
numerical difficulties; see equations with division by a state.
- These are: ETS(A,N,M), ETS(A,A,M), ETS(A,Ad,M).
- Models with multiplicative errors are useful for strictly positive
data, but are not numerically stable with data containing zeros or
negative values. In that case only the six fully additive models will be
applied.
Exponential smoothing models
Forecasting with exponential smoothing
Forecasting with ETS models
- Point forecasts: iterate the equations for \(t=T+1,T+2,\dots,T+h\) and set all \(\varepsilon_t=0\) for \(t>T\).
- Not the same as \(\text{E}(y_{t+h} |
\bm{x}_t)\) unless seasonality is additive.
- fable uses \(\text{E}(y_{t+h}
| \bm{x}_t)\).
- Point forecasts for ETS(A,*,*) are identical to ETS(M,*,*) if the
parameters are the same.
Example: ETS(A,A,N)
\[\begin{align*}
y_{T+1} &= \ell_T + b_T + \varepsilon_{T+1}\\
\hat{y}_{T+1|T} & = \ell_{T}+b_{T}\\
y_{T+2} & = \ell_{T+1} + b_{T+1} + \varepsilon_{T+2}\\
& =
(\ell_T + b_T + \alpha\varepsilon_{T+1}) +
(b_T + \beta \varepsilon_{T+1}) +
\varepsilon_{T+2} \\
\hat{y}_{T+2|T} &= \ell_{T}+2b_{T}
\end{align*}\] etc.
Example: ETS(M,A,N)
\[\begin{align*}
y_{T+1} &= (\ell_T + b_T )(1+ \varepsilon_{T+1})\\
\hat{y}_{T+1|T} & = \ell_{T}+b_{T}.\\
y_{T+2} & = (\ell_{T+1} + b_{T+1})(1 + \varepsilon_{T+2})\\
& = \left\{
(\ell_T + b_T) (1+ \alpha\varepsilon_{T+1}) +
\left[b_T + \beta (\ell_T +
b_T)\varepsilon_{T+1}\right]
\right\}
(1 + \varepsilon_{T+2}) \\
\hat{y}_{T+2|T} &= \ell_{T}+2b_{T}
\end{align*}\] etc.
Forecasting with ETS models
- Prediction intervals can only be generated using the models.
- The prediction intervals will differ between models with additive
and multiplicative errors.
- Exact formulae for some models.
- More general to simulate future sample paths, conditional on the
last estimate of the states, and to obtain prediction intervals from the
percentiles of these simulated future paths.
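The simulation approach can be sketched for ETS(A,N,N) in base R. The helper name `simulate_ann`, the number of paths, and the values used for the last level, \(\alpha\) and \(\sigma\) are assumptions for illustration; this is not fable's implementation.

```r
# Simulate future sample paths from ETS(A,N,N), conditional on the last level.
simulate_ann <- function(l_T, alpha, sigma, h, n_paths = 5000) {
  paths <- matrix(NA_real_, nrow = n_paths, ncol = h)
  for (i in seq_len(n_paths)) {
    level <- l_T
    for (j in seq_len(h)) {
      eps <- rnorm(1, mean = 0, sd = sigma)
      paths[i, j] <- level + eps    # y_{T+j} = l_{T+j-1} + eps
      level <- level + alpha * eps  # l_{T+j} = l_{T+j-1} + alpha * eps
    }
  }
  paths
}

set.seed(123)
paths <- simulate_ann(l_T = 40, alpha = 0.84, sigma = 6, h = 5)

# 95% prediction intervals from the percentiles of the simulated paths
pi95 <- apply(paths, 2, quantile, probs = c(0.025, 0.975))
pi95
colMeans(paths)  # close to the point forecast l_T at every horizon
```

The intervals widen with the horizon because each simulated shock also moves the level, and that movement carries into later periods.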
Prediction intervals
Example: Corticosteroid drug sales
h02 <- PBS %>%
filter(ATC2 == "H02") %>%
summarise(Cost = sum(Cost))
h02 %>% autoplot(Cost)

Example: Corticosteroid drug sales
h02 %>%
model(ETS(Cost)) %>%
report()
## Series: Cost
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.307
## beta = 0.000101
## gamma = 0.000101
## phi = 0.978
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7] s[-8] s[-9]
## 417269 8206 0.872 0.826 0.756 0.773 0.687 1.28 1.32 1.18 1.16 1.1
## s[-10] s[-11]
## 1.05 0.981
##
## sigma^2: 0.0046
##
## AIC AICc BIC
## 5515 5519 5575
Example: Corticosteroid drug sales
h02 %>%
model(ETS(Cost ~ error("A") + trend("A") + season("A"))) %>%
report()
## Series: Cost
## Model: ETS(A,A,A)
## Smoothing parameters:
## alpha = 0.17
## beta = 0.00631
## gamma = 0.455
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7]
## 409706 9097 -99075 -136602 -191496 -174531 -241437 210644 244644 145368
## s[-8] s[-9] s[-10] s[-11]
## 130570 84458 39132 -11674
##
## sigma^2: 3.5e+09
##
## AIC AICc BIC
## 5585 5589 5642
Example: Corticosteroid drug sales
h02 %>%
model(ETS(Cost)) %>%
forecast() %>%
autoplot(h02)

Example: Corticosteroid drug sales
h02 %>%
  model(
    auto = ETS(Cost),
    AAA = ETS(Cost ~ error("A") + trend("A") + season("A"))
  ) %>%
  accuracy()
| Model | MAE | RMSE | MAPE | MASE | RMSSE |
|---|---|---|---|---|---|
| auto | 38649 | 51102 | 4.99 | 0.638 | 0.689 |
| AAA | 43378 | 56784 | 6.05 | 0.716 | 0.766 |